# Vision-Language Large Model

Qwen.qwen2.5 VL 32B Instruct GGUF
Qwen2.5-VL-32B-Instruct is a 32B-parameter multimodal vision-language model that supports joint understanding and generation tasks over images and text.
Image-to-Text
DevQuasar
27.50k
1
Cephalo LaTeX Phi 3 Vision 128k 4b Beta
Apache-2.0
Cephalo is a series of vision-language large models focused on multimodal materials science. The current version specializes in converting mathematical formula images into LaTeX code.
Image-to-Text Transformers
lamm-mit
16
0
Somelvlm
Apache-2.0
SoMeLVLM is a large-scale vision-language model designed for social media processing.
Multimodal Fusion Transformers English
Lishi0905
25
2
Cogvlm Chat Hf
Apache-2.0
CogVLM is a powerful open-source vision-language model that achieves leading performance on multiple cross-modal benchmarks.
Text-to-Image Transformers English
THUDM
4,816
193
© 2025 AIbase